23 research outputs found

    Panorama des travaux en cours dans le domaine des métadonnées [Overview of ongoing work in the field of metadata]

    The spectacular growth in the volume of information available on the Internet has led to growing interest in "metadata", that is, data that describe other data in order to make them more accessible and easier to use. This report first recalls some important points about this notion. It then describes a selection of networked information projects and services that use metadata. Directions for future research are suggested in the conclusion.

    Vers un formalisme abstrait implémentable pour l'étude savante des textes numérisés [Towards an implementable abstract formalism for the scholarly study of digitized texts]


    CoClust: A Python Package for Co-clustering

    Co-clustering (also known as biclustering) is an important extension of cluster analysis because it groups objects and features in a matrix simultaneously, yielding row and column clusters that are both more accurate and easier to interpret. This paper presents the theory underlying several effective diagonal and non-diagonal co-clustering algorithms and describes CoClust, a package that provides implementations of these algorithms. The quality of the results produced by the implemented algorithms is demonstrated through extensive tests on datasets of various sizes and balance. CoClust has been designed to complement and easily interface with popular Python machine learning libraries such as scikit-learn.
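
To make the idea of simultaneous row and column clusters concrete, here is a minimal pure-Python sketch. It does not use CoClust or any of its algorithms: the matrix and the labels are hand-set, hypothetical values, and the helper `reorder_by_coclusters` is an illustrative name. It only shows how a given pair of row and column labelings exposes block structure once the matrix is reordered.

```python
def reorder_by_coclusters(matrix, row_labels, col_labels):
    """Permute rows and columns so that members of the same cluster are adjacent."""
    # sorted() is stable, so items keep their original order within a cluster.
    row_order = sorted(range(len(matrix)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(matrix[0])), key=lambda j: col_labels[j])
    return [[matrix[i][j] for j in col_order] for i in row_order]


# Hypothetical 4 x 4 object-by-feature count matrix.
matrix = [
    [5, 0, 4, 0],
    [0, 3, 0, 2],
    [6, 0, 5, 0],
    [0, 1, 0, 4],
]
# Hand-set labels (an assumption for illustration, not the output of an algorithm).
row_labels = [0, 1, 0, 1]
col_labels = [0, 1, 0, 1]

blocks = reorder_by_coclusters(matrix, row_labels, col_labels)
for row in blocks:
    print(row)
```

After reordering, the non-zero entries sit in two diagonal blocks, which is the kind of structure a diagonal co-clustering is meant to recover.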

    Co-clustering Document-term Matrices by Direct Maximization of Graph Modularity

    We present Coclus, a novel diagonal co-clustering algorithm that effectively co-clusters binary or contingency matrices by directly maximizing an adapted version of the modularity measure traditionally used for networks. While some effective co-clustering algorithms that use network-related measures (normalized cut, modularity) already exist, they rely on spectral relaxations of the discrete optimization problems. In contrast, Coclus obtains even better co-clusters by directly maximizing modularity with an iterative alternating optimization procedure. Extensive comparative experiments on various document-term datasets demonstrate that our algorithm is very effective and stable, and that it outperforms other co-clustering algorithms.
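
The abstract does not give the adapted modularity formula, so the sketch below assumes a common bipartite form: Q = (1/N) Σ_ij (a_ij − a_i· a_·j / N) over all cells whose row and column share a cluster, with N the sum of all entries. The function names and the toy matrix are illustrative, and the row-update step is a simplified reading of "alternating optimization", not the authors' implementation.

```python
def coclustering_modularity(matrix, row_labels, col_labels):
    """Assumed bipartite modularity of a diagonal co-clustering."""
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    q = 0.0
    for i, row in enumerate(matrix):
        for j, a_ij in enumerate(row):
            # Only cells whose row and column fall in the same co-cluster count.
            if row_labels[i] == col_labels[j]:
                q += a_ij - row_sums[i] * col_sums[j] / total
    return q / total


def reassign_rows(matrix, col_labels, n_clusters):
    """One half-step: with columns fixed, give each row its best cluster."""
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    new_labels = []
    for i, row in enumerate(matrix):
        scores = [0.0] * n_clusters
        for j, a_ij in enumerate(row):
            scores[col_labels[j]] += a_ij - row_sums[i] * col_sums[j] / total
        new_labels.append(max(range(n_clusters), key=scores.__getitem__))
    return new_labels


# Toy document-term counts with two obvious blocks (hypothetical data).
matrix = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
]
col_labels = [0, 0, 1, 1]
row_labels = reassign_rows(matrix, col_labels, n_clusters=2)
print(row_labels, coclustering_modularity(matrix, row_labels, col_labels))
```

The column half-step can reuse `reassign_rows` on the transposed matrix with the current row labels; repeating both half-steps until the labels stop changing gives a simple alternating procedure under the stated assumption.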